
    EW-Tune: A Framework for Privately Fine-Tuning Large Language Models with Differential Privacy

    Pre-trained Large Language Models (LLMs) are an integral part of modern AI and have led to breakthrough performance on complex AI tasks. Major AI companies with expensive infrastructure are able to develop and train these large models, with millions to billions of parameters, from scratch. Third parties, researchers, and practitioners are increasingly adopting these pre-trained models and fine-tuning them on their private data to accomplish their downstream AI tasks. However, it has been shown that an adversary can extract or reconstruct exact training samples from these LLMs, which can reveal personally identifiable information. This issue has raised deep concerns about the privacy of LLMs. Differential privacy (DP) provides a rigorous framework for adding noise during the training or fine-tuning of LLMs such that extracting the training data becomes infeasible (i.e., it succeeds only with a cryptographically small probability). While the theoretical privacy guarantees offered in most extant studies assume that models are learned from scratch through many training iterations in an asymptotic setting, this assumption does not hold in fine-tuning scenarios, in which the number of training iterations is significantly smaller. To address this gap, we present EW-Tune, a DP framework for fine-tuning LLMs based on the Edgeworth accountant with finite-sample privacy guarantees. Our results across four well-established natural language understanding (NLU) tasks show that while EW-Tune adds privacy guarantees to the LLM fine-tuning process, it also decreases the induced noise by up to 5.6% and improves the performance of state-of-the-art LLMs by up to 1.1% across all NLU tasks. We have open-sourced our implementation for wide adoption and public testing.
    Comment: Accepted at IEEE ICDM Workshop on Machine Learning for Cybersecurity (MLC) 202
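The noise that a DP accountant reasons about is injected at each training step. A minimal sketch of one such step in the style of DP-SGD (per-example gradient clipping followed by Gaussian noise) is below, written in plain Python and assuming gradients arrive as lists of floats. The function names and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions, not EW-Tune's API; the Edgeworth accountant that would turn these parameters, the sampling rate, and the step count into a finite-sample privacy bound is not shown.

```python
import math
import random
from typing import List

Vector = List[float]


def clip(grad: Vector, max_norm: float) -> Vector:
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads: List[Vector], max_norm: float,
                    noise_multiplier: float, rng: random.Random) -> Vector:
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * max_norm, so each
    example's influence on the sum is bounded by the clipping norm.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_example_grads) for x in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
    print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping happens before summation, so the first gradient (norm 5.0) is scaled down to norm 1.0 while the second (norm 0.5) passes through unchanged. That bounded per-example contribution is what makes the Gaussian noise meaningful for privacy.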

    Efficient Secure Aggregation for Privacy-Preserving Federated Machine Learning

    Federated learning introduces a novel approach to training machine learning (ML) models on distributed data while preserving users' data privacy. This is done by distributing the model to clients, which train it on their local data, and computing the final model at a central server. To prevent data leakage from the local model updates, various works focusing on secure aggregation for privacy-preserving federated learning have been proposed. Despite their merits, most existing protocols still incur high communication and computation overhead on the participating entities and might not be optimized to efficiently handle the large update vectors of ML models. In this paper, we present E-seaML, a novel secure aggregation protocol with high communication and computation efficiency. E-seaML requires only one round of communication in the aggregation phase, and it is up to 318x and 1224x faster for the user and the server, respectively, than its most efficient counterpart. E-seaML also allows the integrity of the final model to be verified efficiently by letting the aggregation server generate a proof of honest aggregation for the participating users. This efficiency and versatility are achieved by extending (and weakening) the assumption of existing works on the set of honest parties (i.e., users) to a set of assisting nodes, which assist the aggregation server in the aggregation process. Given the minimal computation and communication overhead on the assisting nodes, we also discuss how a rotating set of users could serve as the assisting nodes in each iteration. We provide an open-source implementation of E-seaML for public verifiability and testing.

    Unsupervised Threat Hunting using Continuous Bag of Terms and Time (CBoTT)

    Threat hunting is the practice of sifting through system logs to detect malicious activities that might have bypassed existing security measures. It can be performed in several ways, one of which is based on detecting anomalies. We propose an unsupervised framework, called continuous bag-of-terms-and-time (CBoTT), and publish its application programming interface (API) to help researchers and cybersecurity analysts perform anomaly-based threat hunting on SIEM logs geared toward process auditing on endpoint devices. Analyses show that our framework consistently outperforms benchmark approaches. When logs are sorted by their likelihood of being anomalous (from most to least likely), our approach places the known anomalies near the top of the list (between the 1.82nd and 6.46th percentiles), while benchmark approaches place the same anomalies further down (between the 3.25th and 80.92nd percentiles). Other researchers can use this framework to conduct benchmark analyses, and cybersecurity analysts can use it to find anomalies in SIEM logs.
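The evaluation above ranks logs by anomaly likelihood and reports where the known anomalies land. A minimal sketch of that percentile metric follows, assuming each log already has an anomaly score (higher means more anomalous) and that percentiles are counted from the top of the ranked list. The scoring model itself (CBoTT) is not shown, and the function and log names are hypothetical.

```python
from typing import Dict, List


def anomaly_percentiles(scores: Dict[str, float],
                        anomalies: List[str]) -> Dict[str, float]:
    """Percentile position of each known anomaly when logs are ranked by score.

    Logs are sorted from most to least anomalous (highest score first). A value
    of 2.0 means the anomaly sits within the top 2% of the ranked list, so
    smaller numbers indicate a better detector.
    """
    ranked = sorted(scores, key=lambda log_id: scores[log_id], reverse=True)
    position = {log_id: i + 1 for i, log_id in enumerate(ranked)}
    n = len(ranked)
    return {a: 100.0 * position[a] / n for a in anomalies}


if __name__ == "__main__":
    scores = {f"log{i}": float(i) for i in range(200)}  # log199 scores highest
    print(anomaly_percentiles(scores, ["log199", "log150"]))
    # prints {'log199': 0.5, 'log150': 25.0}
```

Comparing two detectors then amounts to comparing these percentiles for the same set of known anomalies.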

    MUSES: Efficient Multi-User Searchable Encrypted Database

    Searchable encrypted systems enable privacy-preserving keyword search on encrypted data. Symmetric Searchable Encryption (SSE) achieves high security (e.g., forward privacy) and efficiency (i.e., sublinear search), but it supports only a single user. Public Key Searchable Encryption (PEKS) supports multi-user settings; however, it suffers from inherent security limitations, such as vulnerability to keyword-guessing attacks and a lack of forward privacy. Recent work has combined SSE and PEKS to achieve the best of both worlds: supporting multi-user settings and providing forward privacy while retaining sublinear complexity. However, despite its elegant design, the existing hybrid scheme inherits some of the security limitations of the underlying paradigms (e.g., pattern leakage, keyword guessing) and might not be suitable for certain applications due to costly public-key operations (e.g., bilinear pairings). In this paper, we propose MUSES, a new multi-user encrypted search scheme that addresses the limitations of the existing hybrid design while offering user efficiency. Specifically, MUSES permits multi-user functionalities (reader/writer separation, permission revocation), prevents keyword-guessing attacks, protects search/result patterns, achieves forward/backward privacy, and features minimal user overhead. In MUSES, we demonstrate a unique incorporation of various state-of-the-art distributed cryptographic protocols, including Distributed Point Function, Distributed PRF, and Secret-Shared Shuffle. We also introduce a new oblivious shuffle protocol for the general multi-party setting with a dishonest majority, which can be of independent interest. Our experimental results indicate that keyword search in our scheme is two orders of magnitude faster, with 13× lower user bandwidth overhead, than the state of the art.
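A Distributed Point Function lets two servers each hold a key share of a function that is 1 at a secret index and 0 elsewhere, so that together they can select a record without either server learning which one was selected. The sketch below is a deliberately naive two-party DPF: each key is a full random bit vector, and the two keys XOR to the indicator of the secret index. It is used here for a two-server private lookup. Real DPF constructions compress keys to roughly O(λ log N) bits. This is not the MUSES construction, and the function names are hypothetical.

```python
import secrets
from typing import List, Tuple


def gen_point_keys(domain_size: int, alpha: int) -> Tuple[List[int], List[int]]:
    """Naive two-party DPF for f(x) = 1 if x == alpha else 0.

    key0 is uniformly random and key1 = key0 XOR e_alpha, so each key alone is
    a uniformly random bit vector that reveals nothing about alpha. Key size is
    linear in the domain, unlike compact DPF constructions.
    """
    key0 = [secrets.randbits(1) for _ in range(domain_size)]
    key1 = [b ^ (1 if x == alpha else 0) for x, b in enumerate(key0)]
    return key0, key1


def eval_key(key: List[int], x: int) -> int:
    """One party's share of f(x); the two shares XOR to f(x)."""
    return key[x]


def server_answer(key: List[int], records: List[int]) -> int:
    """Each server XORs together the records selected by its key share."""
    acc = 0
    for x, rec in enumerate(records):
        if eval_key(key, x):
            acc ^= rec
    return acc


if __name__ == "__main__":
    records = [0x11, 0x22, 0x33, 0x44]
    k0, k1 = gen_point_keys(len(records), alpha=2)
    print(hex(server_answer(k0, records) ^ server_answer(k1, records)))  # prints 0x33
```

Every record except the one at `alpha` is selected by both key shares or by neither, so it cancels when the client XORs the two answers.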

    Compact Energy and Delay-Aware Authentication

    Authentication and integrity are fundamental security services that are critical for any viable system. However, some emerging systems (e.g., smart grids, aerial drones) are delay-sensitive, and therefore their safe and reliable operation requires delay-aware authentication mechanisms. Unfortunately, current state-of-the-art authentication mechanisms either incur heavy computation or lack scalability for such large, distributed systems. Hence, there is a crucial need for digital signature schemes that can satisfy the requirements of delay-aware applications. In this paper, we propose a new digital signature scheme that we refer to as Compact Energy and Delay-aware Authentication (CEDA). In CEDA, signature generation and verification require only a small, constant number of multiplications and Pseudo Random Function (PRF) calls. As a result, it achieves the lowest end-to-end delay among its counterparts. Our implementation results on an ARM processor and commodity hardware show that CEDA has the most efficient signature generation on both platforms, while also offering fast signature verification. Among its delay-aware counterparts, CEDA has a smaller private key and a constant-size signature. All these advantages come at the cost of a larger public key, which is a highly favorable trade-off for applications in which the verifier is not memory-limited. We have open-sourced our implementation of CEDA to enable its broad testing and adoption.

    Lattice-Based Public Key Searchable Encryption from Experimental Perspectives

    Public key Encryption with Keyword Search (PEKS) aims to mitigate the data privacy versus utilization dilemma by allowing any user in the system to send encrypted files to the server to be searched by a receiver. The receiver can retrieve the encrypted files containing specific keywords by providing the server with the corresponding trapdoors for these keywords. Despite their merits, existing PEKS schemes introduce a high end-to-end delay that may hinder their adoption in practice. Moreover, they do not scale well for large security parameters and provide no post-quantum security promise. In this paper, we propose two novel lattice-based PEKS schemes that offer high computational efficiency along with better security assurances than existing alternatives. Specifically, our NTRU-PEKS scheme achieves 18 times lower end-to-end delay than the most efficient pairing-based alternatives. Our LWE-PEKS offers provable security in the standard model with a reduction to worst-case lattice problems. We fully implemented our NTRU-PEKS scheme and benchmarked its performance as deployed on Amazon Web Services cloud infrastructure.

    TACHYON: Fast Signatures from Compact Knapsack

    We introduce a simple yet efficient digital signature scheme that offers a post-quantum security promise. Our scheme, named TACHYON, is based on a novel approach for extending one-time hash-based signatures to (polynomially bounded) many-time signatures, using the additively homomorphic properties of generalized compact knapsack functions. Our design permits TACHYON to achieve several key properties. First, its signing and verification algorithms are the fastest among its current counterparts with a higher level of security. This allows TACHYON to achieve the lowest end-to-end delay among its counterparts, while also making it suitable for resource-limited signers. Second, its private keys can be as small as κ bits, where κ is the desired security level. Third, unlike most of its lattice-based counterparts, TACHYON does not require any Gaussian sampling during signing and is therefore free from side-channel attacks targeting this process. We also explore various speed and storage trade-offs for TACHYON, thanks to its highly tunable parameters. Some of these trade-offs can speed up TACHYON signing in exchange for larger keys, allowing TACHYON to further improve its end-to-end delay.
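For readers unfamiliar with the one-time hash-based signatures mentioned above, here is a minimal sketch of a HORS-style scheme. It also shows how every secret value can be derived from one short seed, which is the idea behind a κ-bit private key. The parameters T and K and all function names are illustrative assumptions, and the compact-knapsack extension that makes TACHYON many-time is not shown; this is only the kind of building block the abstract refers to, not TACHYON itself.

```python
import hashlib
from typing import List, Tuple

T = 1024                     # number of secret values t (illustrative parameter)
K = 16                       # values revealed per signature k (illustrative)
LOG_T = T.bit_length() - 1   # 10 bits per index; K * LOG_T = 160 <= 256


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _secret(seed: bytes, i: int) -> bytes:
    """The i-th secret value, derived on demand from the short seed."""
    return _h(seed + i.to_bytes(4, "big"))


def keygen(seed: bytes) -> Tuple[bytes, List[bytes]]:
    """Private key is just the seed; public key is the hash of every secret value."""
    return seed, [_h(_secret(seed, i)) for i in range(T)]


def message_indices(msg: bytes) -> List[int]:
    """Split SHA-256(msg) into K indices of LOG_T bits each."""
    digest = int.from_bytes(_h(msg), "big")
    return [(digest >> (LOG_T * j)) & (T - 1) for j in range(K)]


def sign(seed: bytes, msg: bytes) -> List[bytes]:
    """Reveal the secret values at the message's indices."""
    return [_secret(seed, i) for i in message_indices(msg)]


def verify(pk: List[bytes], msg: bytes, sig: List[bytes]) -> bool:
    """Hash each revealed value and compare it with the public key entry."""
    idx = message_indices(msg)
    return len(sig) == K and all(_h(s) == pk[i] for s, i in zip(sig, idx))


if __name__ == "__main__":
    sk, pk = keygen(bytes(32))  # all-zero 32-byte seed: demo only, never in practice
    sig = sign(sk, b"meter reading: 42")
    print(verify(pk, b"meter reading: 42", sig))  # prints True
```

Signing only hashes and reveals, which is why this family is fast. Each signature leaks K secret values, however, so a plain HORS key is safe only for one (or a few) signatures. Lifting that limit is the problem the abstract's compact-knapsack construction addresses.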

    Efficient Post-Quantum and Compact Cryptographic Constructions for the Internet of Things

    IoT systems often rely on low-end devices to send measurements to other parties, and, depending on the setting, unauthorized alteration and/or privacy violation of these measurements can have catastrophic consequences (e.g., embedded medical sensors). Therefore, providing efficient authentication, integrity, and confidentiality in these settings is vital. Conventional cryptographic measures (e.g., ECDSA) can meet these security requirements, but despite their elegant design they are often too computationally expensive for low-end devices. This is further exacerbated when security against quantum computers is taken into account. In this dissertation, we propose a series of new efficient conventional and post-quantum cryptographic schemes to meet the stringent requirements of such IoT systems. For efficient authentication, we propose two signature schemes. Our first signature scheme is based on conventional cryptographic problems and uses message encoding with cover-free families and a special property of ECDLP-based functions to achieve a significant performance gain over its counterparts. The second scheme is based on post-quantum primitives and is obtained by extending one-time signatures to (polynomially bounded) many-time signatures, using the additively homomorphic properties of generalized compact knapsack functions. The new scheme achieves the lowest end-to-end delay among its counterparts, which makes it suitable for low-end devices. As a step toward a fully post-quantum blockchain, we propose a Proof of Work (PoW) protocol that minimizes the advantage of a quantum miner. Our new protocol is based on the Hermite Shortest Vector Problem (Hermite-SVP) in the Euclidean norm and allows for a fast verification algorithm.
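A Hermite-SVP solution for a rank-n lattice L is a nonzero lattice vector v with ||v|| ≤ γ · det(L)^(1/n), and checking such a claim is much cheaper than finding one. The sketch below verifies a claimed solution, assuming a full-rank integer basis given as rows, integer coefficients for v, and a rational approximation factor γ. Deriving the lattice from a block header, as a real PoW would require, is not shown, and the function names are hypothetical rather than the dissertation's protocol.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[int]]


def det(basis: Matrix) -> Fraction:
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in basis]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return result


def verify_hermite_svp(basis: Matrix, coeffs: List[int], gamma: Fraction) -> bool:
    """Check that v = coeffs * basis is nonzero and ||v|| <= gamma * det(L)^(1/n).

    Integer coefficients guarantee v lies in the lattice. To avoid floating-point
    roots, the bound is tested on exact powers: ||v||^(2n) <= gamma^(2n) * det(L)^2.
    """
    n = len(basis)
    v = [sum(c * basis[i][j] for i, c in enumerate(coeffs)) for j in range(n)]
    if all(x == 0 for x in v):
        return False
    norm_sq = sum(x * x for x in v)
    return Fraction(norm_sq) ** n <= gamma ** (2 * n) * det(basis) ** 2


if __name__ == "__main__":
    basis = [[2, 0], [0, 3]]  # det = 6, so det^(1/2) is about 2.449
    print(verify_hermite_svp(basis, [1, 0], Fraction(1)))  # prints True
    print(verify_hermite_svp(basis, [1, 1], Fraction(1)))  # prints False
```

Verification costs one vector-matrix product, one determinant (which a PoW could fix in advance), and a norm comparison. Finding a short enough v is the expensive part.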
To alleviate the burden of certificate communication and verification on low-end devices, we then present identity-based and certificateless cryptosystems built on special key generation algorithms that harness the additive homomorphic property of the exponents, enabling users to incorporate their private keys into the one provided by the trusted third party without falsifying it. The new schemes achieve better computational efficiency and comparable communication efficiency relative to their identity-based and certificateless counterparts. Lastly, with the aim of providing efficient and highly secure measures for remote data storage, we propose two lattice-based public key searchable encryption schemes with post-quantum security. To our knowledge, our schemes are the first lattice-based instances of such schemes that provide a post-quantum promise. Our first variant is based on NTRU lattices and provides a significant performance advantage and better end-to-end delay than its existing counterparts. The second scheme, based on the LWE problem in the standard model, provides better security than its counterparts at the cost of inferior performance. All of the proposed schemes are proven secure via rigorous security proofs, and they are implemented and open-sourced to allow for public testing and verification.
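The key-generation idea above rests on the identity g^(a+b) = g^a · g^b: a user can add a private secret x to the partial key d issued by the trusted third party, and the resulting public key is the product of the two public parts. The sketch below shows only that identity. It assumes a toy multiplicative group modulo the Mersenne prime 2^127 - 1 with base 3, which is an insecure choice made purely for illustration, and the function names are hypothetical. It is not the dissertation's identity-based or certificateless scheme.

```python
import secrets
from typing import Tuple

# Toy parameters for illustration only (NOT secure choices): a Mersenne prime
# modulus and a small base. A real system would use a prime-order EC group.
P = 2 ** 127 - 1
G = 3
EXP_MOD = P - 1  # exponents may be reduced mod P - 1 (Fermat's little theorem)


def ttp_issue_partial_key() -> Tuple[int, int]:
    """Hypothetical TTP step: a partial private key d and its public part G^d."""
    d = secrets.randbelow(EXP_MOD)
    return d, pow(G, d, P)


def user_complete_key(d: int, d_pub: int) -> Tuple[int, int, int]:
    """User folds a secret x into d: sk = d + x, so pk = G^sk = G^d * G^x (mod P)."""
    if pow(G, d, P) != d_pub:
        raise ValueError("partial key does not match the TTP's public part")
    x = secrets.randbelow(EXP_MOD)  # known only to the user
    sk = (d + x) % EXP_MOD
    return sk, pow(G, sk, P), pow(G, x, P)


if __name__ == "__main__":
    d, d_pub = ttp_issue_partial_key()
    sk, pk, x_pub = user_complete_key(d, d_pub)
    # Anyone can check the combined key against the two public parts.
    print(pk == (d_pub * x_pub) % P)  # prints True
```

Because the TTP never sees x, it cannot compute the full private key, while the public key remains tied to the TTP's contribution through G^d.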